Abstract — Linear Programming (LP) decoding of Low-Density 
Parity-Check (LDPC) codes has attracted much attention in the 
research community in the past few years. LP decoding has 
been derived for binary and nonbinary linear codes. However, 
the most important problem with LP decoding for both binary 
and nonbinary linear codes is that the complexity of standard 
LP solvers such as the simplex algorithm remains prohibitively 
large for codes of moderate to large block length. To address this 
problem, two low-complexity LP (LCLP) decoding algorithms 
for binary linear codes have been proposed by Vontobel and 
Koetter, henceforth called the basic LCLP decoding algorithm 
and the subgradient LCLP decoding algorithm. In this paper, 
we generalize these LCLP decoding algorithms to nonbinary 
linear codes. The computational complexity per iteration of the 
proposed nonbinary LCLP decoding algorithms scales linearly 
with the block length of the code. A modified BCJR algorithm 
for efficient check-node calculations in the nonbinary basic LCLP 
decoding algorithm is also proposed, which has complexity linear 
in the check node degree. Several simulation results are presented 
for nonbinary LDPC codes defined over Z4, GF(4), and GF(8) 
using quaternary phase-shift keying and 8-phase-shift keying, 
respectively, over the AWGN channel. It is shown that for some 
group-structured LDPC codes, the error-correcting performance 
of the nonbinary LCLP decoding algorithms is similar to or 
better than that of the min-sum decoding algorithm. 

Index Terms — Linear programming decoding, nonbinary 
codes, LDPC codes, coordinate-ascent algorithm, subgradient 
algorithm. 



I. Introduction 

Low-Density Parity-Check (LDPC) codes have attracted 
much attention in the research community in the past decade. 
LDPC codes are generally decoded by message-passing iter- 
ative decoding methods such as the sum-product (SP) algo- 
rithm, also known as belief propagation (BP), and the min-sum 
(MS) algorithm, which perform remarkably well at moderate 
SNR levels. However, binary LDPC codes often suffer from an 
error-floor effect in the high-SNR region. Some progress has 
been made in the direction of finite-length analysis of LDPC 
codes and concepts such as stopping sets [1], trapping sets [2], 
graph-cover pseudocodewords [3], etc., were introduced and 
investigated to understand the behavior of the SP algorithm in 
the error-floor region. Nevertheless, finite-length analysis of 
LDPC codes under the SP algorithm is a difficult task. 

The main focus of research in the area of LDPC codes 
has been on binary LDPC codes. However, it is desirable 
to use nonbinary LDPC codes in many applications where 
bandwidth efficient higher order (i.e., nonbinary) modulation 
schemes are used. Nonbinary LDPC codes are also considered 
for storage applications [4]. Nonbinary LDPC codes and 
the corresponding nonbinary SP algorithm were investigated 
by Davey and MacKay in [5], and since then many code 
construction methods and optimized nonbinary SP algorithms 
have been proposed. However, the finite-length analysis of 
nonbinary LDPC codes under the nonbinary SP algorithm is 
also difficult and attempts in this direction (e.g. [6]) have been 
few. 

An alternative decoding algorithm for binary LDPC codes, 
known as linear programming (LP) decoding¹, was proposed 
by Feldman et al. in [7], [8]. In LP decoding, the ML decoding 
problem is modeled as an integer programming (IP) problem 
which is then relaxed to obtain the corresponding LP problem. 
This LP problem is solved with the help of standard LP solvers 
based on the simplex algorithm or interior-point methods. 
Compared to SP decoding, LP decoding relies on the well- 
studied mathematical theory of LP. Hence, LP decoding is 
better suited to mathematical analysis and it is possible to 
make statements about its complexity and convergence, as 
well as to place bounds on its error-correcting performance. 
However, the worst-case time complexity of LP solvers 
based on the simplex method is known to be exponential in 
the description complexity, whereas for LP solvers based on 
interior-point methods the worst-case time complexity 
is polynomial. On the other hand, iterative decoding 
algorithms such as the SP algorithm have time complexity linear in 
the block length of the code and hence significantly outperform 
LP decoding algorithms based on simplex or interior-point 
methods in terms of efficiency. 

To overcome the complexity problem, several improved LP 
decoding algorithms have been proposed, e.g., in [9], [10], [11], [12], [13]. 
In [9] and [14], the authors use techniques from LP and 
coding theory to derive two low-complexity LP (LCLP) decoding 
algorithms, namely the basic LCLP decoding algorithm 
and the subgradient LCLP decoding algorithm, which can be 
used for approximate LP decoding of binary LDPC codes. The 
basic and subgradient LCLP decoding algorithms rely on the 
block-coordinate ascent method (also known as the nonlinear 
Gauss-Seidel method) [15] and the incremental subgradient 
algorithm [16], respectively, to obtain a solution to the LP 
problem proposed in [8]. Also, the variable node (VN) and 
check node (CN) calculations of the basic LCLP decoding 
algorithm are directly related to VN and CN calculations of the 
binary SP algorithm; hence the complexity of each iteration of 
the basic LCLP decoding algorithm is similar to that of the SP 
algorithm. The complexity of each iteration of the subgradient 
LCLP decoding algorithm is similar to that of the min-sum 
algorithm. An algorithm similar to the basic LCLP decoding 
algorithm for more general graphical models was proposed in 
[17]. An extension of the basic LCLP decoding algorithm was 
proposed and studied in [18]. 

¹In this paper, the acronym LP stands for linear programming or linear 
program, depending on the context. 

In [19], LP decoding was extended from binary linear 
codes to nonbinary linear codes. Nonbinary LP decoding, as 
presented in [19], relies on standard LP solvers based on 
simplex or interior-point methods, and hence standard iterative 
decoding algorithms such as the nonbinary SP algorithm sig- 
nificantly outperform these nonbinary LP decoding algorithms 
in terms of computational complexity. In independent work 
[20], [21], the authors proposed a new scheduling scheme for 
the nonbinary basic LCLP decoding algorithm which extends 
the low-complexity LP decoding method of [18] to nonbinary 
codes. 

In this paper we extend the work of [9], [14] to nonbinary 
linear codes and propose the nonbinary basic and subgradient 
LCLP decoding algorithms. We use the LP formulation of 
nonbinary linear codes proposed in [19] to develop an equiva- 
lent primal LP formulation. Then, using the techniques 
introduced in [22] and [23], the corresponding dual LP is 
derived which in turn is used to develop update equations for 
nonbinary LCLP decoding algorithms. The complexity of the 
proposed nonbinary LCLP decoding algorithms per iteration 
is linear in the code's block length. In contrast to binary basic 
LCLP decoding, the VN and CN calculations of nonbinary 
basic LCLP decoding are not directly related to nonbinary 
SP. Therefore, without the use of an efficient CN processing 
algorithm, the complexity of the CN calculations will be 
exponential in the maximum CN degree. To overcome this 
problem, we propose a modified BCJR algorithm for efficient 
CN processing which has complexity linear in the CN degree 
and allows for efficient implementation of nonbinary basic 
LCLP decoding. We also propose an alternative state metric 
which can be used for faster CN processing. 

The remainder of the paper is structured as follows. We 
begin with some notation and background in Section II. The 
primal LP is developed in Section III and the corresponding 
dual LP is given in Section IV. Section V presents the 
nonbinary basic LCLP decoding algorithm, and reduced com- 
plexity CN processing is presented in Section VI. Section VII 
outlines the nonbinary subgradient LCLP decoding algorithm. 
Simulation results are presented and discussed in Section VIII. 

II. Notation and Background 

The symbols $\mathbb{R}$, $\mathbb{R}_{>0}$, and $\mathbb{Z}_{>0}$ denote the field of real 
numbers, the set of positive real numbers, and the set of 
positive integers, respectively. Let $\mathfrak{R}$ be a finite ring 
with $q$ elements where $0$ and $1$ denote the additive and 
multiplicative identity, respectively, and let $\mathfrak{R}^- = \mathfrak{R} \setminus \{0\}$. 
Let $\mathcal{C}$ be a linear code of length $n$ over the ring $\mathfrak{R}$, defined 
by $\mathcal{C} = \{c \in \mathfrak{R}^n : c\mathcal{H}^T = 0\}$ where $\mathcal{H}$ is an $m \times n$ 
parity-check matrix with entries from $\mathfrak{R}$. The code $\mathcal{C}$ has rate² 
$R(\mathcal{C}) = \log_q(|\mathcal{C}|)/n$ and is referred to as an $[n, \log_q(|\mathcal{C}|)]$ 
linear code over $\mathfrak{R}$. 

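The set $\mathcal{J} = \{1, \dots, m\}$ denotes row indices and the set 
$\mathcal{I} = \{1, \dots, n\}$ denotes column indices of $\mathcal{H}$. We use $\mathcal{H}_j$ 
for the $j$-th row of $\mathcal{H}$ and $\mathcal{H}^i$ for the $i$-th column of $\mathcal{H}$. The 
support of the vector $c$ is denoted by $\mathrm{supp}(c)$. For each $j \in \mathcal{J}$, 
let $\mathcal{I}_j = \mathrm{supp}(\mathcal{H}_j)$ and for each $i \in \mathcal{I}$, let $\mathcal{J}_i = \mathrm{supp}(\mathcal{H}^i)$. 
Also let $d_j = |\mathcal{I}_j|$ and $d = \max_{j \in \mathcal{J}}\{d_j\}$. We define the set 

$$\mathcal{E} = \{(i,j) \in \mathcal{I} \times \mathcal{J} : j \in \mathcal{J},\ i \in \mathcal{I}_j\} = \{(i,j) \in \mathcal{I} \times \mathcal{J} : i \in \mathcal{I},\ j \in \mathcal{J}_i\}.$$

Moreover, for each $j \in \mathcal{J}$, we define the local 
single parity check (SPC) code 
$\mathcal{B}_j = \{(b_i)_{i \in \mathcal{I}_j} : \sum_{i \in \mathcal{I}_j} b_i \cdot \mathcal{H}_{j,i} = 0\}$. 
For each $i \in \mathcal{I}$, we denote by $\mathcal{A}_i \subseteq \mathfrak{R}^{|\{0\} \cup \mathcal{J}_i|}$ the 
repetition code of the appropriate length and indexing. We also 
use the following notation introduced in [9]: for a statement 
$A$ we have $[\![A]\!] = 0$ if $A$ is true and $[\![A]\!] = +\infty$ otherwise. 
As in [19], we define the mapping 

$$\xi : \mathfrak{R} \to \{0,1\}^{q-1} \subset \mathbb{R}^{q-1}$$

by 

$$\xi(r) = x = (x^{(\rho)})_{\rho \in \mathfrak{R}^-}$$

such that for each $\rho \in \mathfrak{R}^-$ 

$$x^{(\rho)} = \begin{cases} 1 & \text{if } \rho = r, \\ 0 & \text{otherwise.} \end{cases}$$

Building on this we define 

$$\Xi : \bigcup_{t \in \mathbb{Z}_{>0}} \mathfrak{R}^t \to \bigcup_{t \in \mathbb{Z}_{>0}} \{0,1\}^{(q-1)t} \subset \bigcup_{t \in \mathbb{Z}_{>0}} \mathbb{R}^{(q-1)t},$$

according to $\Xi(c) = (\xi(c_1), \dots, \xi(c_t))$, $\forall c \in \mathfrak{R}^t$, $t \in \mathbb{Z}_{>0}$. 
For vectors $f \in \mathbb{R}^{(q-1)n}$ we use the notation 
$f = (f_1 \,|\, f_2 \,|\, \cdots \,|\, f_n)$ where $\forall i \in \mathcal{I}$, $f_i = (f_i^{(r)})_{r \in \mathfrak{R}^-}$. 

We also define the inverse of $\Xi$ as 
$\Xi^{-1}(f) = (\xi^{-1}(f_1), \xi^{-1}(f_2), \dots, \xi^{-1}(f_n))$. Note that the inverse of $\Xi$ 
is well defined for any $f \in \mathbb{R}^{(q-1)n}$ where each component 
$f_i$, $i \in \mathcal{I}$, has entries from $\{0, 1\}$ with sum at most 1. 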
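As an illustration only, the mappings $\xi$, $\Xi$ and their inverses can be sketched for the special case $\mathfrak{R} = \mathbb{Z}_q$, with ring elements represented by the integers $0, \dots, q-1$. The function names (`xi`, `big_xi`, `xi_inv`, `big_xi_inv`) and the integer representation are assumptions made for this sketch and are not part of the original formulation.

```python
from typing import List, Sequence


def xi(r: int, q: int) -> List[int]:
    """Map r in Z_q to a 0/1 vector indexed by the nonzero elements 1..q-1."""
    return [1 if rho == r else 0 for rho in range(1, q)]


def big_xi(c: Sequence[int], q: int) -> List[int]:
    """Concatenate xi(c_1), ..., xi(c_t) for a vector c over Z_q."""
    out: List[int] = []
    for symbol in c:
        out.extend(xi(symbol, q))
    return out


def xi_inv(x: Sequence[int]) -> int:
    """Inverse of xi; defined only for 0/1 vectors with sum at most 1."""
    if any(v not in (0, 1) for v in x) or sum(x) > 1:
        raise ValueError("xi_inv needs 0/1 entries with sum at most 1")
    # The all-zero vector is the image of the ring element 0.
    return 0 if sum(x) == 0 else list(x).index(1) + 1


def big_xi_inv(f: Sequence[int], q: int) -> List[int]:
    """Inverse of big_xi, applied block by block (block length q-1)."""
    step = q - 1
    if len(f) % step != 0:
        raise ValueError("length of f must be a multiple of q-1")
    return [xi_inv(f[k:k + step]) for k in range(0, len(f), step)]


codeword = [3, 0, 1]
image = big_xi(codeword, 4)          # three blocks of length 3
recovered = big_xi_inv(image, 4)
```

Each symbol is expanded to $q-1$ binary coordinates, so the zero symbol maps to the all-zero block; this is why the inverse requires a block sum of at most 1.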

We assume transmission over a $q$-ary input memoryless 
channel whose input alphabet is identified with $\mathfrak{R}$, and whose 
output alphabet is denoted by $\Sigma$. The received vector is 
denoted by $y = (y_1, y_2, \dots, y_n) \in \Sigma^n$. Based on this, for 
each $i \in \mathcal{I}$ we define a vector $\lambda_i = (\lambda_i^{(r)})_{r \in \mathfrak{R}^-}$ where, for 
each $y_i \in \Sigma$, $r \in \mathfrak{R}^-$, 

$$\lambda_i^{(r)} = \log\left(\frac{p(y_i \mid 0)}{p(y_i \mid r)}\right).$$

Here $p(y \mid c)$ 
denotes the channel output probability (density) conditioned 
on the channel input. Based on this, we also define 
$\lambda = (\lambda_1 \,|\, \lambda_2 \,|\, \cdots \,|\, \lambda_n)$. 
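The log-likelihood ratio vector of a single received sample can be sketched as follows for a complex AWGN channel. This is an assumption-laden example: it takes $\mathfrak{R} = \mathbb{Z}_q$, maps symbol $r$ to the unit-energy PSK point $e^{2\pi \mathrm{i} r/q}$ (the natural labeling, which this section does not specify), and uses a hypothetical parameter `sigma2` for the noise variance per real dimension.

```python
import cmath
import math
from typing import List


def psk_point(r: int, q: int) -> complex:
    """Assumed natural mapping of r in Z_q to a unit-energy q-PSK point."""
    return cmath.exp(2j * math.pi * r / q)


def llr_vector(y: complex, sigma2: float, q: int = 4) -> List[float]:
    """Return (log p(y|0)/p(y|r)) for r = 1, ..., q-1.

    For Gaussian noise, p(y|r) is proportional to
    exp(-|y - s_r|^2 / (2*sigma2)), so the log-ratio reduces to a
    difference of squared Euclidean distances.
    """
    d0 = abs(y - psk_point(0, q)) ** 2
    return [(abs(y - psk_point(r, q)) ** 2 - d0) / (2.0 * sigma2)
            for r in range(1, q)]


# A noiseless observation of symbol 0: every nonzero symbol is penalized.
lam = llr_vector(1 + 0j, sigma2=0.5, q=4)
```

A positive entry $\lambda_i^{(r)}$ indicates that symbol 0 is more likely than symbol $r$ given the observation.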

For $\kappa \in \mathbb{R}_{>0}$, we define the function $\psi(x) = e^{\kappa x}$, and its 
inverse $\psi^{-1}(x) = \frac{1}{\kappa}\log(x)$. We will use Forney-style factor 
graphs (FFGs), also known as normal factor graphs [24], to 
represent the linear programs introduced in this paper. An FFG 
is a diagram that represents the factorization of a function of 
several variables. For more information on FFGs the reader is 
referred to [24], [22], [25]. 

²The code rate is defined as the ratio of the number of information symbols 
to the number of coded symbols. Note that, for a code over a ring 
$\mathfrak{R}$, the code rate may not in general be expressed in terms of the rank of $\mathcal{H}$ 
(since $\mathcal{H}$ may contain non-invertible elements). 

III. The Primal Linear Program 

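In [19] the authors presented the following linear program 
to decode nonbinary linear codes: 

NBLPD: 

$$
\begin{aligned}
\text{min.} \quad & \sum_{i \in \mathcal{I}} \langle \lambda_i, f_i \rangle \\
\text{subj. to} \quad & f_i^{(r)} = \sum_{\substack{b \in \mathcal{B}_j \\ b_i = r}} w_{j,b} && \forall j \in \mathcal{J},\ \forall i \in \mathcal{I}_j,\ \forall r \in \mathfrak{R}^-, \\
& w_{j,b} \ge 0 && \forall j \in \mathcal{J},\ \forall b \in \mathcal{B}_j, \\
& \sum_{b \in \mathcal{B}_j} w_{j,b} = 1 && \forall j \in \mathcal{J}.
\end{aligned}
$$

We denote the polytope represented by the variables and 
constraints of NBLPD as $\mathcal{Q}_f$. Two alternative polytope 
representations are also given in [19], which are both equivalent 
to NBLPD. It is also possible to reformulate the constraints 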
of NBLPD with additional auxiliary variables. However, to 
develop a low-complexity LP decoding algorithm for NBLPD, 
we use the approach of [9] and reformulate NBLPD so that the 
new LP formulation can be directly represented by an FFG: 

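PNBLPD: 

$$
\begin{aligned}
\text{min.} \quad & \sum_{i \in \mathcal{I}} \langle \lambda_i, f_i \rangle \\
\text{subj. to} \quad & f_i = u_{i,0} && (i \in \mathcal{I}), \\
& u_{i,j} = v_{j,i} && ((i,j) \in \mathcal{E}), \\
& \sum_{a \in \mathcal{A}_i} \alpha_{i,a}\, \Xi(a) = u_i && (i \in \mathcal{I}), \\
& \sum_{b \in \mathcal{B}_j} \beta_{j,b}\, \Xi(b) = v_j && (j \in \mathcal{J}), \\
& \alpha_{i,a} \ge 0 && (i \in \mathcal{I},\ a \in \mathcal{A}_i), \\
& \beta_{j,b} \ge 0 && (j \in \mathcal{J},\ b \in \mathcal{B}_j), \\
& \sum_{a \in \mathcal{A}_i} \alpha_{i,a} = 1 && (i \in \mathcal{I}), \\
& \sum_{b \in \mathcal{B}_j} \beta_{j,b} = 1 && (j \in \mathcal{J}).
\end{aligned}
$$
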
Here we introduce the definitions $u_{i,j} = (u_{i,j}^{(r)})_{r \in \mathfrak{R}^-}$ and 
$v_{j,i} = (v_{j,i}^{(r)})_{r \in \mathfrak{R}^-}$ for all $i \in \mathcal{I}$, $j \in \mathcal{J}_i \cup \{0\}$. We also 
define $u_i = (u_{i,j})_{j \in \mathcal{J}_i \cup \{0\}}$ for $i \in \mathcal{I}$, and $v_j = (v_{j,i})_{i \in \mathcal{I}_j}$ for 
$j \in \mathcal{J}$. We denote the polytope represented by the variables 
and constraints of PNBLPD by $\mathcal{Q}_p$. It is important to note 
that along with the convex hulls of the single parity-check 
codes, PNBLPD also explicitly models the convex hulls of 
the repetition codes. The constraints of NBLPD and PNBLPD 
appear to be quite different due to the different notations. 
However, the projection of each polytope onto the variables 
denoted by $f$ is the same in both cases, and therefore the LPs 
are equivalent from the point of view of decoding. Their 
equivalence is stated in Theorem 3.1. 

Theorem 3.1: Polytopes $\mathcal{Q}_f$ and $\mathcal{Q}_p$ are equivalent from an 
LP decoding perspective, i.e., for every $(f, \alpha, \beta) \in \mathcal{Q}_p$ there 
exists a $w$ such that $(f, w) \in \mathcal{Q}_f$, and conversely, for every 
$(f, w) \in \mathcal{Q}_f$ there exist $\alpha, \beta$ such that $(f, \alpha, \beta) \in \mathcal{Q}_p$. 

Proof: The proof of Theorem 3.1 can be found in [26]. ■ 

Before deriving the dual linear program, we reformulate 
PNBLPD so that this LP can be represented by an FFG. For 
this purpose, the constraints of PNBLPD are expressed as 
additive cost terms (also known as penalty terms). The rule 
for assigning a cost to a configuration of variables is: if a 
given configuration satisfies the LP constraints then cost 0 is 
assigned to this configuration, otherwise $+\infty$ is assigned. The 
PNBLPD is then equivalent to the unconstrained minimization 
of the augmented cost function 



E A ^+E^= M *.°J+ E 

»ez (i,j)es 

+E^(«*) + E^(^) ' 



(1) 



where Vi G X and Vj G J we have defined 



Ai{ui) 



E ai <°- H ( a ) 

x6.Ai 

E °- ia = 1 



E > 0] 



a£Ai 



LaeAi 



E h w 

bGBj 

E = i 

b£B, 



= v, 



E IP* * °i 

beB 3 



For ease of illustration we consider a (5, 2) code over Z4 with 
parity-check matrix 



$\mathcal{H}$ = 



13 10 
10 10 
3 1 



The augmented cost function for this code is represented by 
the FFG of Figure 1. 

IV. Dual Linear Program 

In this section we derive a dual LP for PNBLPD. As 
shown in subsequent sections, the dual LP is useful for the 
development of the nonbinary LCLP decoding algorithms. 

The dual LP of PNBLPD can be derived from the augmented 
cost function of (1). First we derive the duals of 
$A_i(u_i)$ and $B_j(v_j)$. For simplicity of exposition, we assume 
$\mathcal{A}_i = \{a_1, a_2\} = \{(a_1^1, a_1^2, a_1^3), (a_2^1, a_2^2, a_2^3)\}$. The (primal) 
FFG of $A_i(u_i)$ is shown in Figure 3 and its dual is shown in 
Figure 4. The dual FFG is derived with the help of techniques 
introduced in [22] and [23]. The dual function $\hat{A}_i(\hat{u}_i)$ is 
obtained from the FFG of Figure 4 as follows, 

Fig. 1. FFG which represents the augmented cost function of (1) for 
the example (5, 2) binary code. 

Fig. 2. FFG which represents the augmented cost function of (5) for 
the example (5, 2) binary code. Here a function node which is marked 
with the symbol ~, and which is connected to edges $\hat{u}$ and $\hat{v}$, denotes 
the function $-[\![ \hat{u} = -\hat{v} ]\!]$. 



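$$
\hat{A}_i(\hat{u}_i) = \hat{\phi}_i - [\![ \hat{\alpha}_{i,a_1} \ge 0 ]\!] - [\![ \hat{\alpha}_{i,a_2} \ge 0 ]\!]
\tag{2}
$$

where 

$$
\hat{\alpha}_{i,a_1} = -\hat{\phi}_i + \langle -\hat{u}_i, \Xi(a_1) \rangle
\;\Rightarrow\;
-[\![ \hat{\alpha}_{i,a_1} \ge 0 ]\!] = -[\![ \hat{\phi}_i \le \langle -\hat{u}_i, \Xi(a_1) \rangle ]\!].
\tag{3}
$$

Similarly 

$$
-[\![ \hat{\alpha}_{i,a_2} \ge 0 ]\!] = -[\![ \hat{\phi}_i \le \langle -\hat{u}_i, \Xi(a_2) \rangle ]\!].
\tag{4}
$$

From (2), (3), and (4) we obtain 

$$
\hat{A}_i(\hat{u}_i) = \hat{\phi}_i - [\![ \hat{\phi}_i \le \langle -\hat{u}_i, \Xi(a_1) \rangle ]\!] - [\![ \hat{\phi}_i \le \langle -\hat{u}_i, \Xi(a_2) \rangle ]\!]
$$

$$
\Rightarrow\; \hat{A}_i(\hat{u}_i) = \hat{\phi}_i - \Big[\!\!\Big[ \hat{\phi}_i \le \min_{a \in \mathcal{A}_i} \langle -\hat{u}_i, \Xi(a) \rangle \Big]\!\!\Big].
$$

The same procedure can be used to derive the dual of 
$B_j(v_j)$ as 

$$
\hat{B}_j(\hat{v}_j) = \hat{\theta}_j - \Big[\!\!\Big[ \hat{\theta}_j \le \min_{b \in \mathcal{B}_j} \langle -\hat{v}_j, \Xi(b) \rangle \Big]\!\!\Big].
$$

We use $\hat{A}_i(\hat{u}_i)$ and $\hat{B}_j(\hat{v}_j)$ to derive the dual of the LP 
represented by (1), which is in the form of the maximization 
of the following augmented cost function, 

$$
\sum_{i \in \mathcal{I}} \hat{A}_i(\hat{u}_i) + \sum_{j \in \mathcal{J}} \hat{B}_j(\hat{v}_j)
- \sum_{i \in \mathcal{I}} [\![ \hat{f}_i = -\hat{u}_{i,0} ]\!]
- \sum_{(i,j) \in \mathcal{E}} [\![ \hat{u}_{i,j} = -\hat{v}_{j,i} ]\!]
- \sum_{i \in \mathcal{I}} [\![ \hat{f}_i = \lambda_i ]\!].
\tag{5}
$$

The dual of PNBLPD can now be obtained from (5). 

DNBLPD: 

$$
\begin{aligned}
\text{max.} \quad & \sum_{i \in \mathcal{I}} \hat{\phi}_i + \sum_{j \in \mathcal{J}} \hat{\theta}_j \\
\text{subj. to} \quad & \hat{\phi}_i \le \min_{a \in \mathcal{A}_i} \langle -\hat{u}_i, \Xi(a) \rangle && (i \in \mathcal{I}), \\
& \hat{\theta}_j \le \min_{b \in \mathcal{B}_j} \langle -\hat{v}_j, \Xi(b) \rangle && (j \in \mathcal{J}), \\
& \hat{u}_{i,j} = -\hat{v}_{j,i} && ((i,j) \in \mathcal{E}), \\
& \hat{u}_{i,0} = -\hat{f}_i && (i \in \mathcal{I}), \\
& \hat{f}_i = \lambda_i && (i \in \mathcal{I}).
\end{aligned}
$$

The augmented cost function of (5) for the (5, 2) binary code 
is represented by the FFG of Figure 2. 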

We make use of the soft-minimum operator introduced in [9] 
and derive the softened dual linear program. For any $\kappa \in \mathbb{R}_{>0}$, 
the soft-minimum operator is defined as 

$$
{\min_l}^{(\kappa)} \{z_l\} \triangleq -\frac{1}{\kappa} \log\Big( \sum_l e^{-\kappa z_l} \Big) = -\psi^{-1}\Big( \sum_l \psi(-z_l) \Big).
$$

Note that $\min_l^{(\kappa)}\{z_l\} \le \min_l\{z_l\}$, with equality attained in 
the limit as $\kappa \to \infty$. With this we define the softened dual 
linear program SDNBLPD which is the same as DNBLPD 
except that $\min$ is replaced by $\min^{(\kappa)}$. 
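The behavior of the soft-minimum operator can be checked numerically with a short sketch. The function name `soft_min` and the shift by the true minimum (a standard log-sum-exp stabilization, added here for numerical safety) are choices made for this illustration and are not taken from [9].

```python
import math
from typing import Sequence


def soft_min(z: Sequence[float], kappa: float) -> float:
    """Return -(1/kappa) * log(sum_l exp(-kappa * z_l)).

    Subtracting m = min(z) inside the exponentials leaves the value
    unchanged but avoids overflow for large kappa.
    """
    m = min(z)
    total = sum(math.exp(-kappa * (v - m)) for v in z)
    return m - math.log(total) / kappa


values = [1.0, 2.0, 3.0]
loose = soft_min(values, kappa=1.0)     # strictly below min(values)
tight = soft_min(values, kappa=200.0)   # approaches min(values)
```

For two equal arguments the gap to the true minimum is exactly $\log(2)/\kappa$, which shows how the approximation tightens as $\kappa$ grows.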

V. Nonbinary Basic Low-Complexity Linear 
Programming Decoding Algorithm 

As mentioned earlier, the basic LCLP decoding algorithm 
proposed in [9] is a block-coordinate ascent type algorithm. 
The block-coordinate ascent algorithm iteratively finds the 
optimum of a given continuously differentiable function. Each 
iteration of the block-coordinate ascent algorithm consists of 
multiple steps and during each step, a block of variables is 
updated so that the given function is optimized with respect to 
them, while at the same time the rest of the variables are kept 
constant. An iteration of the block-coordinate ascent algorithm 
is completed when all variables are updated. 

Fig. 3. FFG for the function $A_i(u_i)$. This forms a subgraph of the 
overall FFG of Figure 1. 

Fig. 4. FFG for the function $\hat{A}_i(\hat{u}_i)$. This FFG is dual to that of 
Figure 3. Here, for any primal variable $x$, the dual variable is denoted 
by $\hat{x}$. 

In this section, we derive the nonbinary basic LCLP decoding 
algorithm. For this, it is important to observe from 
SDNBLPD that the variables $\hat{u}_{i,j}$ and $\hat{v}_{j,i}$ are coupled with 
each other, i.e., we always have $\hat{u}_{i,j} = -\hat{v}_{j,i}$ for all $(i,j) \in \mathcal{E}$. 

It can be observed that in SDNBLPD, $\hat{\phi}_i$ and $\hat{\theta}_j$ are each 
involved in only one inequality and hence we can replace 
these inequalities with equalities without changing the optimal 
solution (the same is true of DNBLPD). With this, let us select 
an edge $(i,j) \in \mathcal{E}$ and assume that the variables associated with 
the rest of the edges are kept constant; then optimizing the cost 
of SDNBLPD with respect to $\hat{u}_{i,j}$ is equivalent to optimizing 
$h(\hat{u}_{i,j})$, where 

$$
h(\hat{u}_{i,j}) = {\min_{a \in \mathcal{A}_i}}^{(\kappa)} \langle -\hat{u}_i, \Xi(a) \rangle + {\min_{b \in \mathcal{B}_j}}^{(\kappa)} \langle -\hat{v}_j, \Xi(b) \rangle.
\tag{6}
$$

Although the soft-minimum operator is an approximation of 
the minimum operator, its advantage lies in ensuring the 
convexity and differentiability of the function $h(\hat{u}_{i,j})$ in (6), 
which makes possible the proofs of Lemmas 5.1 and 5.2 
described below. 

If the current values of the variables $\hat{u}_{i,j}, \hat{\phi}_i, \hat{\theta}_j$ related to 
the edge $(i,j) \in \mathcal{E}$ are replaced with the new values (at the 
same time keeping variables related to other edges constant) 
such that $h(\hat{u}_{i,j})$ is maximized, then we can guarantee that 
the dual function also increases or else remains constant at 
its current value. The new value $\bar{u}_{i,j}^{(r)}$ for each $\hat{u}_{i,j}^{(r)}$, $r \in \mathfrak{R}^-$, 
which maximizes $h(\hat{u}_{i,j})$ is given by 

$$
\bar{u}_{i,j}^{(r)} = \arg\max_{\hat{u}_{i,j}^{(r)}} h(\hat{u}_{i,j}).
\tag{7}
$$



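Once we have calculated $\bar{u}_{i,j}^{(r)}$, we can update the variables 
$\hat{\phi}_i$ and $\hat{\theta}_j$ accordingly. The calculation of $\bar{u}_{i,j}^{(r)}$ is given in the 
following lemma. 

Lemma 5.1: For any $r \in \mathfrak{R}^-$ the value of $\bar{u}_{i,j}^{(r)}$ of (7) can 
be calculated using 

$$
\bar{u}_{i,j}^{(r)} = \frac{1}{2} \Big( (V_{i,\bar{r}} - V_{i,r}) - (C_{j,\bar{r}} - C_{j,r}) \Big),
$$

where 

$$
V_{i,\bar{r}} \triangleq -{\min_{\substack{a \in \mathcal{A}_i \\ a_j \ne r}}}^{(\kappa)} \langle -\hat{u}_i, \Xi(a) \rangle,
\qquad
V_{i,r} \triangleq -{\min_{\substack{a \in \mathcal{A}_i \\ a_j = r}}}^{(\kappa)} \langle -\tilde{u}_i, \Xi(\tilde{a}) \rangle,
$$

$$
C_{j,\bar{r}} \triangleq -{\min_{\substack{b \in \mathcal{B}_j \\ b_i \ne r}}}^{(\kappa)} \langle -\hat{v}_j, \Xi(b) \rangle,
\qquad
C_{j,r} \triangleq -{\min_{\substack{b \in \mathcal{B}_j \\ b_i = r}}}^{(\kappa)} \langle -\tilde{v}_j, \Xi(\tilde{b}) \rangle.
$$

Here the vectors $\tilde{u}_i$ and $\tilde{a}$ are the vectors $\hat{u}_i$ and $a$, respectively, 
where the $j$-th position is excluded. Similarly, vectors 
$\tilde{v}_j$ and $\tilde{b}$ are obtained by excluding the $i$-th position from $\hat{v}_j$ 
and $b$, respectively. 

Proof: The proof of Lemma 5.1 can be found in [26]. ■ 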
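The closed-form update of Lemma 5.1 can be sanity-checked by brute force on a toy example. The sketch below is an illustration under stated assumptions, not the decoder itself: it takes $\mathfrak{R} = \mathbb{Z}_4$, a repetition code of length 3, an SPC code with the hypothetical coefficient row `(1, 3, 1)`, random dual variables, and helper names (`inner`, `neg_soft_min`, `h_at`) invented for this example. It evaluates $h$ from (6) directly and compares the closed-form maximizer against nearby values.

```python
import itertools
import math
import random


def soft_min(z, kappa):
    m = min(z)
    return m - math.log(sum(math.exp(-kappa * (v - m)) for v in z)) / kappa


def inner(vecs, word, skip=None):
    """<vecs, Xi(word)>, optionally leaving out one position."""
    return sum(vecs[k][s - 1] for k, s in enumerate(word)
               if s != 0 and k != skip)


def neg_soft_min(vecs, code, pos, r, kappa, equal):
    """V or C term: word[pos] == r (position excluded) or != r (full)."""
    if equal:
        vals = [-inner(vecs, w, skip=pos) for w in code if w[pos] == r]
    else:
        vals = [-inner(vecs, w) for w in code if w[pos] != r]
    return -soft_min(vals, kappa)


q, kappa = 4, 1.5
A = [(a, a, a) for a in range(q)]                      # repetition code
row = (1, 3, 1)                                        # assumed SPC row
B = [b for b in itertools.product(range(q), repeat=3)
     if sum(x * y for x, y in zip(row, b)) % q == 0]   # SPC code

rng = random.Random(0)
u_i = [[rng.uniform(-1, 1) for _ in range(q - 1)] for _ in range(3)]
v_j = [[rng.uniform(-1, 1) for _ in range(q - 1)] for _ in range(3)]
pos_a, pos_b, r = 1, 0, 2          # edge sits at these local positions
v_j[pos_b] = [-x for x in u_i[pos_a]]                  # u_ij = -v_ji


def h_at(u):
    """h of (6) with component r of the edge variable set to u."""
    ui = [vec[:] for vec in u_i]
    vj = [vec[:] for vec in v_j]
    ui[pos_a][r - 1] = u
    vj[pos_b][r - 1] = -u
    return (soft_min([-inner(ui, a) for a in A], kappa)
            + soft_min([-inner(vj, b) for b in B], kappa))


V_bar = neg_soft_min(u_i, A, pos_a, r, kappa, equal=False)
V_r = neg_soft_min(u_i, A, pos_a, r, kappa, equal=True)
C_bar = neg_soft_min(v_j, B, pos_b, r, kappa, equal=False)
C_r = neg_soft_min(v_j, B, pos_b, r, kappa, equal=True)
u_star = 0.5 * ((V_bar - V_r) - (C_bar - C_r))
```

Since the four quantities do not depend on the component being updated, `u_star` can be computed once from the current variables; evaluating `h_at` at `u_star` and at perturbed values shows that no perturbation improves the objective.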

Lemma 5.1 is a generalization of Lemma 3 of [9] to the case 
of nonbinary codes. One visible difference between the binary 
case and the present generalization is in the calculation of $V_{i,\bar{r}}$ 
and $C_{j,\bar{r}}$. Here in the case of nonbinary codes, the calculation 
of $V_{i,\bar{r}}$ does not exclude the $j$-th entry from $a \in \mathcal{A}_i$ and $\hat{u}_i$; 
similarly, the calculation of $C_{j,\bar{r}}$ does not exclude the $i$-th 
entry from $b \in \mathcal{B}_j$ and $\hat{v}_j$. Note that this is not inconsistent 
since $\hat{u}_{i,j}^{(r)}$ is never used to update itself. Here the calculation 
of $V_{i,\bar{r}}$ and $C_{j,\bar{r}}$ requires $\bar{r} \in \mathfrak{R} \setminus \{0, r\}$ and hence $\xi(\bar{r})$ is 
always multiplied with the corresponding $\hat{u}_{i,j}^{(\bar{r})}$. This ensures 
that $\hat{u}_{i,j}^{(r)}$ is not used for calculating $\bar{u}_{i,j}^{(r)}$. 

As mentioned in [9], the update equation given in Lemma 3 
of [9] can be efficiently computed with the help of the variable 
and check node calculations of the (binary) SP algorithm. Due 
to this, the complexity of computing $(C_{j,\bar{r}} - C_{j,r})$ is $O(d)$ for 
binary codes. On the other hand, in the case of nonbinary codes 
the mapping $\Xi$ used in NBLPD transforms the nonbinary 
linear codes $\mathcal{A}_i$ (repetition code) and $\mathcal{B}_j$ (SPC code) into 
nonlinear binary codes $\mathcal{A}_i^{\mathrm{NL}} = \{\Xi(a) : \forall a \in \mathcal{A}_i\}$ and 
$\mathcal{B}_j^{\mathrm{NL}} = \{\Xi(b) : \forall b \in \mathcal{B}_j\}$, respectively. Here, the computation 
of $(V_{i,\bar{r}} - V_{i,r})$ and $(C_{j,\bar{r}} - C_{j,r})$ is related to the SP decoding 
of nonlinear binary codes $\mathcal{A}_i^{\mathrm{NL}}$ and $\mathcal{B}_j^{\mathrm{NL}}$. If $\mathcal{A}_i$ and $\mathcal{B}_j$ have 
equal lengths then they are duals of each other; however, the 
relationship between $\mathcal{A}_i^{\mathrm{NL}}$ and $\mathcal{B}_j^{\mathrm{NL}}$ is not so simple. 

One option to compute $(C_{j,\bar{r}} - C_{j,r})$ is by going through all 
possible codewords of the SPC code $\mathcal{B}_j$ exhaustively. In this 
case the complexity of computing $(C_{j,\bar{r}} - C_{j,r})$ is $O(d q^{(d-1)})$. 
Another possibility is to use the trellis of the nonbinary SPC 
code to calculate these values. In Section VI we prove that 
the computation of $C_{j,\bar{r}}$ and $C_{j,r}$ can be carried out with 
complexity linear in the check node degree by using a trellis- 
based variant of the SP algorithm. 

Before we come to that section, we formulate the complete 
decoding algorithm which uses the update equation given in 
Lemma 5.1. We select an edge $(i,j) \in \mathcal{E}$ and calculate $\bar{u}_{i,j}$ 
from Lemma 5.1. Then $\hat{\phi}_i$, $\hat{\theta}_j$ and the objective function 
are updated accordingly. One iteration is completed when all 
edges $(i,j) \in \mathcal{E}$ are updated cyclically. This is a coordinate- 
ascent type algorithm and its convergence may be proved in 
the same manner as in Lemma 4 of [9]. 

Lemma 5.2: Assume that $d \ge 3$ for a given parity-check 
matrix $\mathcal{H}$ of the code $\mathcal{C}$. If we update all edges $(i,j) \in \mathcal{E}$ 
cyclically with the update equation given in Lemma 5.1 
then the objective function of SDNBLPD converges to its 
maximum. 

Proof: The proof is essentially the same as that of Lemma 
4 of [9]. ■ 

The algorithm terminates after a fixed number of iterations 
or when it finds a codeword. Knowing the solution of 
SDNBLPD does not give an estimate of the codeword directly. 
However, an estimate of the $i$-th symbol $c_i$ can be obtained 
from the vector $\hat{u}_i$. For this we define 



*(r) A 



A 



(r) 







if r = 



Let $\mathcal{M}_i = \arg\min_{r \in \mathfrak{R}} \{x_i^{(r)}\}$. If $\mathcal{M}_i$ contains a single element 
$r^*$, then the symbol estimate is obtained as $\hat{c}_i = r^*$; otherwise, 
we mark $\hat{c}_i$ as erased. 

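Due to the soft-minimum operator, the function $h(\hat{u}_{i,j})$ 
in (6) is differentiable everywhere and this fact is used in 
Lemma 5.1 to obtain the update equations. However, for 
practical implementations we are interested in $\kappa \to \infty$. As 
mentioned earlier, in the limit $\kappa \to \infty$, the soft-minimum 
operator becomes the minimum operator, which requires less 
computation. The following lemma considers $\kappa \to \infty$. 

Lemma 5.3: In the limit $\kappa \to \infty$, the function $h(\hat{u}_{i,j})$ is 
maximized by any value $\bar{u}_{i,j}^{(r)}$ that lies in the closed interval 
between 

$$
(V_{i,\bar{r}} - V_{i,r}) \quad \text{and} \quad -(C_{j,\bar{r}} - C_{j,r}),
$$

where 

$$
V_{i,\bar{r}} = -\min_{\substack{a \in \mathcal{A}_i \\ a_j \ne r}} \langle -\hat{u}_i, \Xi(a) \rangle,
\qquad
V_{i,r} = -\min_{\substack{a \in \mathcal{A}_i \\ a_j = r}} \langle -\tilde{u}_i, \Xi(\tilde{a}) \rangle,
$$

$$
C_{j,\bar{r}} = -\min_{\substack{b \in \mathcal{B}_j \\ b_i \ne r}} \langle -\hat{v}_j, \Xi(b) \rangle,
\qquad
C_{j,r} = -\min_{\substack{b \in \mathcal{B}_j \\ b_i = r}} \langle -\tilde{v}_j, \Xi(\tilde{b}) \rangle.
$$

Proof: The proof of the lemma is a generalization of 
Lemma 5 of [9]. ■ 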

Now we can update edges $(i,j) \in \mathcal{E}$ cyclically where $\bar{u}_{i,j}$ 
is calculated according to Lemma 5.3. However, in this case, 
we cannot guarantee convergence of the algorithm. This is 
because for $\kappa \to \infty$ the objective function is not everywhere 
differentiable and it is not possible to use the same argument 
as in Lemma 5.2. This problem is also discussed in Conjecture 
6 of [9]. After the algorithm terminates, the decision rule 
described above can be used to obtain each symbol estimate 
$\hat{c}_i$. 

The nonbinary basic LCLP decoding algorithm of 
Lemma 5.1 updates a single variable related to an edge $(i,j) \in \mathcal{E}$ 
at a time. However, we observed from our simulation work 
that updating all variables related to an edge $(i,j) \in \mathcal{E}$ 
simultaneously, and processing each edge $(i,j) \in \mathcal{E}$ one at 
a time, does not affect the convergence or the error-correcting 
performance of the nonbinary basic LCLP decoding algorithm. 
It is also possible to solve NBLPD by varying all the edge 
variables related to a VN $i \in \mathcal{I}$ or a CN $j \in \mathcal{J}$ simultaneously. 
Such a variant of the basic LCLP algorithm was proposed in [27]. 
We extended the work of [27] to nonbinary codes for the case 
in which all the edge variables related to a VN $i \in \mathcal{I}$ are 
updated simultaneously. Details about this case can be found 
in [28]. For the other case in which all edge variables related 
to a CN $j \in \mathcal{J}$ are updated simultaneously, we remark that 
the approach of [27] cannot be used with the nonbinary basic 
LCLP decoding algorithm [28]. 

VI. Modified BCJR Algorithm for Check Node 
Calculation 

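In this section we propose a modified BCJR algorithm 
which allows for efficient implementation of the nonbinary 
basic LCLP decoding algorithm. We observe that the equations 
for $C_{j,r}$ and $C_{j,\bar{r}}$ can be rewritten as follows: 

$$
\psi(C_{j,r}) = \sum_{\substack{b \in \mathcal{B}_j \\ b_i = r}} \psi\big( \langle \tilde{v}_j, \Xi(\tilde{b}) \rangle \big),
\tag{8}
$$

$$
\psi(C_{j,\bar{r}}) = \sum_{\substack{b \in \mathcal{B}_j \\ b_i \ne r}} \psi\big( \langle \hat{v}_j, \Xi(b) \rangle \big).
\tag{9}
$$
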



It may be observed from the above equations that the 
calculation of $C_{j,r}$ and $C_{j,\bar{r}}$ is in the form of the marginalization 
of a product of functions. Hence it is possible to compute 
$C_{j,r}$ and $C_{j,\bar{r}}$ with the help of a trellis-based variant of the SP 
algorithm (i.e., a BCJR-type algorithm). One possibility is to use 
the trellis of the binary nonlinear code $\mathcal{B}_j^{\mathrm{NL}} = \{\Xi(b) : \forall b \in \mathcal{B}_j\}$. 
However, due to the nonlinear nature of this binary code, the 
state complexity at the center of its trellis would be exponential 
in $d_j$. Here state merging is also not possible. Hence there is 
no complexity advantage when we use the trellis of the binary 
nonlinear code $\mathcal{B}_j^{\mathrm{NL}}$. 

However, if the trellis for the nonbinary SPC code $\mathcal{B}_j$ is 
used, then the state complexity at each trellis step is $O(q)$ and 
is independent of $d_j$. The branch complexity of this trellis is 
$O(q^2)$. In the following, we prove that the marginals $C_{j,\bar{r}}$ and 
$C_{j,r}$ can be efficiently calculated with some modifications to 
the BCJR algorithm which uses the trellis of the nonbinary 
code $\mathcal{B}_j$. 

For ease of exposition, we will assume here that $\mathcal{I}_j = \{0, 1, \dots, d_j - 1\}$, 
and let $\mathcal{H}_{j,i} = h_i$ for $i \in \mathcal{I}_j$. We then 
define the following for the trellis of the SPC code $\mathcal{B}_j$: 

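1) The set of all states at time $i$ is given by $\mathcal{S}_i$, $i \in \{0, \dots, d_j\}$. 
Also here $\mathcal{S}_i = \mathfrak{R}$ for every $i$. 

2) There is a branch joining $s \in \mathcal{S}_i$ and $s' \in \mathcal{S}_{i+1}$ for 
every symbol $b_i$ satisfying $s' - s = h_i b_i$ (if no such 
symbols $b_i$ exist, there is no such trellis branch). For 
such a symbol $b_i$, the 'branch metric' is given by 
$g(b_i) = \psi\big( \langle \hat{v}_{j,i}, \xi(b_i) \rangle \big)$. 

3) We define $\sigma(i_1, i_2) = \sum_{i = i_1}^{i_2} h_i b_i$ for $b \in \mathcal{B}_j$. In the 
trellis for the SPC code, each state $s \in \mathcal{S}_i$ represents the 
'partial syndrome' $\sigma(0, i - 1)$. 

4) The state metric for forward recursion is 

$$
\mu_i(s) = \sum_{\substack{(b_0, \dots, b_{i-1}) \\ \sigma(0, i-1) = s}} \; \prod_{t=0}^{i-1} g(b_t), \quad s \in \mathcal{S}_i,\ i \in \mathcal{I}_j
\tag{10}
$$

with $\mu_0(0) = 1$, $\mu_0(r) = 0$, $\forall r \in \mathfrak{R}^-$. Similarly, the 
state metric for backward recursion is 

$$
\nu_i(s) = \sum_{\substack{(b_i, \dots, b_{d_j - 1}) \\ \sigma(i, d_j - 1) = -s}} \; \prod_{t=i}^{d_j - 1} g(b_t), \quad s \in \mathcal{S}_i,\ i \in \mathcal{I}_j
\tag{11}
$$

with $\nu_{d_j}(0) = 1$, $\nu_{d_j}(r) = 0$, $\forall r \in \mathfrak{R}^-$. 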
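Lemma 6.1: $C_{j,r}$ and $C_{j,\bar{r}}$ can be efficiently computed on 
the trellis of the nonbinary code $\mathcal{B}_j$ as follows, 

$$
\psi(C_{j,\bar{r}}) = \sum_{s \in \mathcal{S}_i} \; \sum_{b_i \in \mathfrak{R} \setminus \{r\}} \mu_i(s) \cdot \nu_{i+1}(s + h_i b_i) \cdot g(b_i),
\tag{12}
$$

$$
\psi(C_{j,r}) = \sum_{s \in \mathcal{S}_i} \mu_i(s) \cdot \nu_{i+1}(s + h_i r),
\tag{13}
$$

where state metrics $\mu_i$ and $\nu_{i+1}$ are calculated recursively from 
previous state metrics via 

$$
\mu_i(s) = \sum_{b_{i-1} \in \mathfrak{R}} \mu_{i-1}(s - h_{i-1} b_{i-1}) \cdot g(b_{i-1}),
\qquad
\nu_i(s) = \sum_{b_i \in \mathfrak{R}} \nu_{i+1}(s + h_i b_i) \cdot g(b_i).
$$

Proof: The proof of Lemma 6.1 can be found in [29] for 
the case where all of the coefficients $\mathcal{H}_{j,i}$ (for $i \in \mathcal{I}_j$) are set 
to equal the ring's multiplicative identity 1; extension of the 
proof to handle arbitrary coefficients is straightforward. ■ 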

Here the CN calculations are carried out in two phases: 
in the first phase, the forward and backward state metrics 
are calculated and stored; in the second phase the marginals 
$C_{j,r}$ and $C_{j,\bar{r}}$ are computed according to Lemma 6.1, where 
the state metrics computed in the first phase are utilized. It may 
be observed that the aforementioned algorithm is essentially 
the same as the BCJR algorithm except for the second phase 
where marginals are calculated. Note that in general the trellis 
may contain parallel branches, since some of the entries of 
the parity-check matrix may be non-invertible elements of the 
ring. 
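The two-phase computation can be sketched and checked against exhaustive enumeration. The code below is an illustrative sketch under assumptions: the ring is $\mathbb{Z}_q$ with integer arithmetic modulo $q$, the backward metric is indexed by trellis state (paths from state $s$ at time $i$ that end in state 0), and the names `trellis_marginals`, `brute_marginals`, and the random test values are hypothetical. The test row deliberately contains the non-invertible element 2 of $\mathbb{Z}_4$.

```python
import itertools
import math
import random


def trellis_marginals(h, g, i, r, q):
    """Return (psi(C_{j,r}), psi(C_{j,rbar})) for position i and symbol r.

    h[t] is the coefficient at position t; g[t][b] is the branch metric
    of symbol b at position t.
    """
    d = len(h)
    mu = [[0.0] * q for _ in range(d + 1)]
    nu = [[0.0] * q for _ in range(d + 1)]
    mu[0][0] = 1.0
    nu[d][0] = 1.0
    # Phase 1: forward and backward state metrics.
    for t in range(1, d + 1):
        for s in range(q):
            mu[t][s] = sum(mu[t - 1][(s - h[t - 1] * b) % q] * g[t - 1][b]
                           for b in range(q))
    for t in range(d - 1, -1, -1):
        for s in range(q):
            nu[t][s] = sum(nu[t + 1][(s + h[t] * b) % q] * g[t][b]
                           for b in range(q))
    # Phase 2: marginals; the branch at position i is left out for b_i = r.
    psi_r = sum(mu[i][s] * nu[i + 1][(s + h[i] * r) % q] for s in range(q))
    psi_rbar = sum(mu[i][s] * nu[i + 1][(s + h[i] * b) % q] * g[i][b]
                   for s in range(q) for b in range(q) if b != r)
    return psi_r, psi_rbar


def brute_marginals(h, g, i, r, q):
    """Same quantities by enumerating all codewords of the SPC code."""
    d = len(h)
    psi_r = psi_rbar = 0.0
    for b in itertools.product(range(q), repeat=d):
        if sum(x * y for x, y in zip(h, b)) % q != 0:
            continue
        if b[i] == r:
            psi_r += math.prod(g[t][b[t]] for t in range(d) if t != i)
        else:
            psi_rbar += math.prod(g[t][b[t]] for t in range(d))
    return psi_r, psi_rbar


q, kappa = 4, 1.0
h = [1, 3, 2, 1]                      # 2 is non-invertible in Z_4
rng = random.Random(1)
v = [[rng.uniform(-1, 1) for _ in range(q - 1)] for _ in h]
# g(b) = psi(<v, xi(b)>) with psi(x) = exp(kappa*x); xi(0) is all-zero.
g = [[1.0] + [math.exp(kappa * x) for x in vt] for vt in v]

fast = trellis_marginals(h, g, i=1, r=2, q=q)
slow = brute_marginals(h, g, i=1, r=2, q=q)
```

The trellis version costs on the order of $d_j q^2$ operations for the metrics, whereas the enumeration visits all $q^{d_j}$ candidate words.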

A. Alternative State Metric for Faster Calculation of $C_{j,\bar{r}}$ 

The forward state metric $\mu$ as defined in (10) needs to be 
computed for the calculation of $C_{j,r}$ and can be reused for the 
calculation of $C_{j,\bar{r}}$. In (12) the algorithm needs to go through 
all branches $(s, s') \in \mathcal{S}_i \times \mathcal{S}_{i+1}$, $s' - s \ne h_i r$, for the calculation 
of $C_{j,\bar{r}}$. If the proposed algorithm is implemented in hardware 
or on multicore architectures, then the computation time for 
$C_{j,\bar{r}}$ can be reduced by parallelizing its calculation. One 
possibility to parallelize the calculation of $C_{j,\bar{r}}$ is to define a 
new forward state metric $\tilde{\mu}$ which can be computed in parallel 
with $\mu$ in the first phase and reduces the calculations required 
during the second phase of the algorithm. For this we define 
an alternative forward state metric as follows, 

$$
\tilde{\mu}_i(s, r) = \sum_{\substack{(b_0, \dots, b_{i-1}) \\ \sigma(0, i-1) = s,\ b_{i-1} \ne r}} \; \prod_{t=0}^{i-1} g(b_t), \quad s \in \mathcal{S}_i,\ i \in \mathcal{I}_j,\ r \in \mathfrak{R}^-
\tag{14}
$$

with $\tilde{\mu}_0(s, r) = 0$, $\forall s \in \mathcal{S}_0$, $\forall r \in \mathfrak{R}^-$. It should be noted that 
due to the condition $b_{i-1} \ne r$, $\tilde{\mu}_i(s, r)$ cannot be calculated 
recursively from $\tilde{\mu}_{i-1}$; instead it is calculated together with 
$\mu_i(s)$ from $\mu_{i-1}$ as follows, 

$$
\tilde{\mu}_i(s, r) = \sum_{b_{i-1} \in \mathfrak{R} \setminus \{r\}} \mu_{i-1}(s - h_{i-1} b_{i-1}) \cdot g(b_{i-1}).
$$

With the help of the alternative forward state metric given in 
(14), the expression (12) of Lemma 6.1 can be rewritten as 

$$
\psi(C_{j,\bar{r}}) = \sum_{s' \in \mathcal{S}_{i+1}} \tilde{\mu}_{i+1}(s', r) \cdot \nu_{i+1}(s').
\tag{15}
$$

The forward state metric $\tilde{\mu}_i(s, r)$ requires the calculation 
and storage of an additional $q - 1$ values for each state $s \in \mathcal{S}_i$ 
during the first phase. Hence the storage requirement for 
the calculation of $C_{j,\bar{r}}$ with (15) increases by a factor of $q$. 
However, all additional state metric values can be calculated in 
parallel with $\mu$, which does not affect the run time of the first 
phase of the algorithm. Also, the second phase of the algorithm 
needs to go through only $q$ states instead of $q(q-1)$ branches, 
hence the overall run time for computing $C_{j,\bar{r}}$ is reduced with 
the state metric $\tilde{\mu}$. 
B. Calculation of Marginals with $\kappa \to \infty$ 

In Lemma 6.1, $\kappa$ is assumed to be finite. However, for 
many practical applications we are interested in $\kappa \to \infty$. 
According to Lemma 5.3, for $\kappa \to \infty$ we again need to 
calculate $(C_{j,r} - C_{j,\bar{r}})$ to update the corresponding variables. 
However, the marginals $C_{j,r}$ and $C_{j,\bar{r}}$ are here obtained as the 
limit $\kappa \to \infty$ of (8) and (9), respectively, i.e., 



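$$C_{j,r} = -\min_{\substack{b \in \mathcal{B}_j \\ b_i = r}} \langle -v_j, \Xi(b) \rangle, \qquad C_{j,\bar{r}} = -\min_{\substack{b \in \mathcal{B}_j \\ b_i \neq r}} \langle -v_j, \Xi(b) \rangle. \qquad (16)$$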

Thus $C_{j,r}$ and $C_{j,\bar{r}}$ can be obtained by replacing all "product" 
operations with "sum" operations and all "sum" operations with 
"min" operations in (8) and (9) (marginals with finite $\kappa$). 
In (8) and (9) the marginalization is performed in the sum-product 
semiring. However, for $\kappa \to \infty$ the marginalization is performed 
in the min-sum semiring, and hence the marginals of (16) can be 
computed with a trellis-based variant of the MS algorithm. If we 
redefine the branch metric as $g(b_i) = \langle v_{j,i}, \xi(b_i) \rangle$ and replace all 
"product" operations with "sum" operations and all "sum" operations 
with "min" operations in (10), (11), (12), (13), (14), and (15), then 
the resulting equations can be used on the trellis of the nonbinary 
SPC code $\mathcal{B}_j$ to compute the marginals of (16). This trellis-based 
variant of the MS algorithm is related to the Viterbi algorithm. 
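
The semiring substitution described above can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions: the SPC code is taken over the integer ring Z_q with parity check sum_t h[t]*b[t] = 0 (mod q), and `cost[t][b]` is a hypothetical table holding the per-symbol terms of the inner product that is minimized in (16). The function names are not from the paper. Negating the two returned minima would then give the two marginals.

```python
import itertools

INF = float("inf")

def min_sum_metrics(h, cost, q):
    """Forward (mu) and backward (nu) state metrics with (min, +) in place of (+, *)."""
    d = len(h)
    mu = [[INF] * q for _ in range(d + 1)]
    nu = [[INF] * q for _ in range(d + 1)]
    mu[0][0] = 0.0  # paths start in the zero state
    nu[d][0] = 0.0  # codewords end in the zero state
    for i in range(d):
        for s in range(q):
            for b in range(q):
                s_next = (s + h[i] * b) % q
                mu[i + 1][s_next] = min(mu[i + 1][s_next], mu[i][s] + cost[i][b])
    for i in range(d - 1, -1, -1):
        for s in range(q):
            nu[i][s] = min(cost[i][b] + nu[i + 1][(s + h[i] * b) % q] for b in range(q))
    return mu, nu

def min_sum_marginals(h, cost, q, i, r):
    """Return (min over codewords with b_i == r, min over codewords with b_i != r)."""
    mu, nu = min_sum_metrics(h, cost, q)
    best_eq, best_ne = INF, INF
    for s in range(q):
        for b in range(q):
            total = mu[i][s] + cost[i][b] + nu[i + 1][(s + h[i] * b) % q]
            if b == r:
                best_eq = min(best_eq, total)
            else:
                best_ne = min(best_ne, total)
    return best_eq, best_ne

def brute_force_marginals(h, cost, q, i, r):
    """Reference values obtained by enumerating every codeword of the SPC code."""
    best_eq, best_ne = INF, INF
    for b in itertools.product(range(q), repeat=len(h)):
        if sum(hk * bk for hk, bk in zip(h, b)) % q != 0:
            continue
        total = sum(cost[t][b[t]] for t in range(len(h)))
        if b[i] == r:
            best_eq = min(best_eq, total)
        else:
            best_ne = min(best_ne, total)
    return best_eq, best_ne
```

The trellis pass visits each branch a constant number of times, whereas the brute-force reference grows exponentially with the check node degree and is included only to check the sketch.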

VII. Nonbinary Subgradient Low-Complexity LP 
Decoding Algorithm 

In [9] the authors proposed the subgradient LCLP decoding 
algorithm for binary LDPC codes. The objective function of 
the dual LP (denoted DLPD2 in [9]) can be expressed as a sum 
of several component functions. Based on this observation, the 
authors proposed the use of incremental subgradient methods 
[16] for the maximization of the dual objective function in 
DLPD2. 

The main idea behind incremental subgradient methods is to 
process each component function separately where variables 
related to the selected component function are updated imme- 
diately. An iteration of the incremental subgradient method can 
be seen as a sequence within which each component function 
is processed exactly once [16]. 

Similar to the dual LP DLPD2 of [9], the objective function 
(which is concave but not everywhere differentiable) of 
DNBLPD can also be expressed as the sum of component 
functions. Hence it is also possible to use incremental subgradient 
methods to find the solution of DNBLPD. As in the 
previous section, we assume $u_{i,j} = -v_{j,i}$ for all $(i,j) \in \mathcal{E}$. 

To develop the nonbinary subgradient LCLP decoding algorithm, 
we consider the component function given by the term 
in the objective function related to CN $j \in \mathcal{J}$, i.e., 



$$m_j(v_j) = \min_{b \in \mathcal{B}_j} \langle -v_j, \Xi(b) \rangle. \qquad (17)$$



We provide the definition of the subgradient for this part of 
the objective function in the following lemma. 



Lemma 7.1: For the term in the objective function related 
to the CN $j \in \mathcal{J}$ given in (17), a subgradient is given by 

$$s_j(v_j) = -\Xi\Big( \arg\min_{b \in \mathcal{B}_j} \langle -v_j, \Xi(b) \rangle \Big).$$

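Proof: For $s_j(v_j)$ to be a subgradient of $m_j(v_j)$, the 
following inequality must hold [16] 

$$m_j(v'_j) \le m_j(v_j) + \langle s_j(v_j), v'_j - v_j \rangle \qquad (18)$$

for all $v'_j \in \mathbb{R}^{(q-1)|\mathcal{I}_j|}$. We define 

$$b' = \arg\min_{b \in \mathcal{B}_j} \langle -v_j, \Xi(b) \rangle. \qquad (19)$$

With this, we obtain 

$$\begin{aligned}
m_j(v_j) + \langle s_j(v_j), v'_j - v_j \rangle
&= \min_{b \in \mathcal{B}_j} \langle -v_j, \Xi(b) \rangle + \langle s_j(v_j), v'_j \rangle - \langle s_j(v_j), v_j \rangle \\
&= \langle -v_j, \Xi(b') \rangle + \langle -\Xi(b'), v'_j \rangle - \langle -\Xi(b'), v_j \rangle \\
&= \langle -v'_j, \Xi(b') \rangle \\
&\ge \min_{b \in \mathcal{B}_j} \langle -v'_j, \Xi(b) \rangle \\
&= m_j(v'_j),
\end{aligned}$$

thereby proving (18) and the fact that $s_j(v_j)$ is a subgradient 
of $m_j(v_j)$. 

Note that if more than one vector $b \in \mathcal{B}_j$ achieves the 
minimum in (19), a subgradient is given by the negative of 
an arbitrary convex combination of the corresponding vectors 
$\Xi(b)$. ■ 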
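
The inequality (18) can be checked numerically on a small example. The Python sketch below is a brute-force illustration only and does not use the trellis. It assumes an SPC code over the integer ring Z_q and takes the mapping `xi` to send a nonzero ring element beta to the unit vector of length q-1 with a one in position beta-1, and zero to the all-zero vector. This mapping and all function names are assumptions made for the example.

```python
import itertools
import random

def xi(beta, q):
    """Assumed symbol mapping: indicator vector of length q-1 (all-zero for beta == 0)."""
    vec = [0.0] * (q - 1)
    if beta != 0:
        vec[beta - 1] = 1.0
    return vec

def Xi(b, q):
    """Concatenation of xi(b_t) over all positions of the vector b."""
    return [x for beta in b for x in xi(beta, q)]

def codewords(h, q):
    """All vectors over Z_q satisfying the single parity check defined by h."""
    return [b for b in itertools.product(range(q), repeat=len(h))
            if sum(hk * bk for hk, bk in zip(h, b)) % q == 0]

def dot(x, y):
    return sum(a * c for a, c in zip(x, y))

def m_j(v, h, q):
    """Component function of (17), evaluated by enumeration."""
    neg_v = [-x for x in v]
    return min(dot(neg_v, Xi(b, q)) for b in codewords(h, q))

def subgradient(v, h, q):
    """Negative of Xi at one minimizing codeword, as in Lemma 7.1."""
    neg_v = [-x for x in v]
    b_star = min(codewords(h, q), key=lambda b: dot(neg_v, Xi(b, q)))
    return [-x for x in Xi(b_star, q)]
```

A usage example would draw random vectors `v` and `v_new` of length (q-1)*len(h) and confirm that `m_j(v_new, h, q)` never exceeds `m_j(v, h, q) + dot(subgradient(v, h, q), v_new - v)`, up to rounding.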

The subgradient of the above lemma is denoted by $s_j$. It can 
be observed that the subgradient $s_j$, which is a generalization 
of the subgradient given in [9] for binary codes, can be 
efficiently obtained with the help of the Viterbi algorithm on 
the trellis of the nonbinary SPC code $\mathcal{B}_j$. Once the subgradient 
is obtained, the dual variable $v_j$ can be updated as [16] 

$$v_j \leftarrow v_j + \vartheta_l \cdot s_j(v_j), \qquad (20)$$

where $\vartheta_l \in \mathbb{R}_{>0}$ is the step size at iteration $l$. The dual 
variable $u_i$ related to VN $i \in \mathcal{I}$ can be updated in an 
analogous manner. The subgradient for the VN update can 
be computed with some modifications to the VN calculations 
used in the nonbinary SP algorithm. One iteration of the 
algorithm is completed when all check-node-related updates of 
dual variables $v_j$, $j \in \mathcal{J}$, and then all variable-node-related 
updates of dual variables $u_i$, $i \in \mathcal{I}$, have been (sequentially) 
performed. The convergence of this algorithm is guaranteed 
for a suitably chosen step size sequence $\{\vartheta_l\}_{l \ge 1}$ [16]. The 
decision rule to obtain the estimate of the symbols from the 
dual variables $u_i$, $i \in \mathcal{I}$, is the same as the one given in 
Section V. 

Fig. 5. Frame/symbol error rate for the (155, 64) quaternary LDPC code 
under QPSK modulation. 

Fig. 6. Average number of iterations required to converge for the (155, 64) 
quaternary LDPC code under QPSK modulation. 

The choice of step size sequence $\{\vartheta_l\}_{l \ge 1}$ can also affect 
the convergence as well as the error-correcting performance 
of the algorithm. Some possible step size rules (e.g., constant, 
diminishing, dynamic, etc.) are discussed in [16]. It was determined 
through extensive simulation work that the following 
staircase-type step size rule works best for most nonbinary 
LDPC codes (independent of the code parameters): 

$$\vartheta_l = \begin{cases} \vartheta_{l-1} \times 0.8 & \text{if } l \text{ is divisible by } 20, \\ \vartheta_{l-1} & \text{otherwise.} \end{cases}$$

The initial value $\vartheta_1$ is also determined by simulation. 
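
For concreteness, the staircase rule can be written as a short Python helper. This is a minimal sketch; the function name and the `factor` and `period` parameters are introduced here for illustration, with defaults set to the values 0.8 and 20 stated above.

```python
def staircase_step_sizes(theta_1, num_iterations, factor=0.8, period=20):
    """Return the step sizes for iterations l = 1, ..., num_iterations.

    The step size is multiplied by `factor` whenever the iteration index l
    is divisible by `period`, and is left unchanged otherwise.
    """
    sizes = []
    theta = theta_1
    for l in range(1, num_iterations + 1):
        if l > 1 and l % period == 0:
            theta *= factor
        sizes.append(theta)
    return sizes
```

With an initial value of 0.15, for example, the step size stays at 0.15 for iterations 1 to 19, drops to about 0.12 at iteration 20, and to about 0.096 at iteration 40.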

The nonbinary basic LCLP decoding algorithm is an edge- 
by-edge algorithm, i.e., it processes each edge in the Tanner 
graph separately. During the decoding of regular LDPC codes 
with the nonbinary basic LCLP decoding algorithm, the modified 
BCJR algorithm of Lemma 6.1 is utilized $m \cdot d_c$ times, 
and VN calculations are carried out $n \cdot d_v$ times, in a single 
iteration. In contrast to this, the nonbinary subgradient LCLP 
decoding algorithm works on a node-by-node basis, i.e., it 
updates all the edges related to a CN or a VN simultaneously. 
Hence the nonbinary subgradient algorithm runs the Viterbi 
algorithm only m times, and performs the VN calculations n 
times, in a single iteration. Also, the Viterbi algorithm is 
computationally less expensive than the modified BCJR algorithm 
used in the nonbinary basic LCLP decoding algorithm. This 
reduces the complexity of a single iteration of the nonbinary 
subgradient algorithm significantly, and as a result it is similar 
to that of the MS algorithm. One more advantage of the 
nonbinary subgradient LCLP decoding algorithm is the ease 
of computation of the dual function value (the contribution 
of the component function given in Lemma 7.1 towards the 
global function is computed by the Viterbi algorithm in the 
form of the forward state metric). Similarly, the component 
function value is also output as a by-product of the VN 
computations. Hence the global function value can be easily 
computed during each iteration. The algorithm may be deemed 
to have converged to the solution of DNBLPD when the 
difference between the global function values computed during 
successive iterations is close to zero; this criterion may be 
used to efficiently implement an early stopping mechanism. 
The global function value computed during each iteration can 
also be utilized to adapt the step size dynamically to improve 
the convergence and/or error-correcting performance of the 
nonbinary subgradient LCLP decoding algorithm. 
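
The early stopping criterion mentioned above can be sketched as follows. The sketch assumes a hypothetical callable `iterate(l)` that performs decoder iteration `l` and returns the global function value after that iteration. The tolerance `tol` is an illustrative parameter rather than a value from the paper, and the default of 100 iterations matches the maximum used in the simulations.

```python
def run_with_early_stopping(iterate, max_iterations=100, tol=1e-6):
    """Run iterations until successive global function values differ by less than tol.

    Returns (number of iterations performed, last global function value).
    """
    previous = None
    for l in range(1, max_iterations + 1):
        value = iterate(l)
        if previous is not None and abs(value - previous) < tol:
            return l, value  # successive values are close: treat as converged
        previous = value
    return max_iterations, previous
```

In a decoder, `iterate` would wrap one full pass of CN and VN updates and sum the component function values that those updates already produce.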

VIII. Simulation Results 

This section presents simulation results for the nonbinary 
basic and subgradient LCLP decoding algorithms. We use a 
cyclic edge-update schedule for the nonbinary basic LCLP 
decoding algorithm. The nonbinary basic LCLP decoding 
algorithm uses the trellis-based CN calculations described in 
Section VI, and we consider $\kappa \to \infty$ for all simulations. The 
MS and SP algorithms also use the trellis of the nonbinary 
SPC code for CN processing. We use the binary (204, 102) 
MacKay LDPC matrix and the (155,64), (755,334), and 
(1055,424) group-structured LDPC matrices from [30], but 
with nonzero parity-check matrix entries replaced by randomly 
selected nonzero entries from the finite ring. The (155, 64) 
and (1055, 424) LDPC codes have parity-check matrix entries 
from $\mathbb{Z}_4$ and GF(4), respectively, and the (204, 102) and 
(755, 334) LDPC codes have parity-check matrix entries from 
GF(8). The (155,64), (1055,424), and (755,334) matrices 
are (3, 5)-regular group-structured LDPC matrices; hence there 
are 5 nonzero entries in each row. For the (155, 64) and 
(1055, 424) matrices, we set all nonzero entries to 1 ($= \zeta^0$) 
in each row ($\zeta$ is a primitive element of the finite field under 
consideration). For the (755, 334) LDPC matrix, the first, 
second, third, fourth, and fifth nonzero entries in each row are 
set to $1$, $\zeta^2$, $\zeta^4$, $\zeta^6$, and $1$, respectively. The (204, 102) LDPC 
matrix is a (3, 6)-regular matrix and we set the first, second, 
third, fourth, fifth, and sixth nonzero entry in each row to 
elements $1$, $\zeta^2$, $\zeta^4$, $\zeta^6$, and $1$, respectively. 

Furthermore, we assume transmission over the AWGN 
channel where for the (155, 64) and (1055, 424) LDPC codes 
the nonbinary symbols are directly mapped to quaternary 
phase-shift keying (QPSK) signals and for the (204,102) 
and (755, 334) codes, nonbinary symbols are directly mapped 
to 8-PSK signals. We simulate up to 100 frame errors per 
simulation point. Unless otherwise specified, the maximum 
number of iterations is set to 100. 
Fig. 7. Frame error rate for the (1055, 424) LDPC code over GF(4) under 
QPSK modulation. 

The error-correcting performance of the (155, 64) LDPC 
code is shown in Figure 5, where the frame error rate (FER) 
and symbol error rate (SER) of the nonbinary basic LCLP 
decoding algorithm are compared with those of the MS algorithm. 
For this code, the FER of the nonbinary basic LCLP decoding 
algorithm is similar to that of the MS algorithm for low and 
moderate SNR levels; however, it is better by around 0.25 dB 
for higher SNR levels. The SER of the nonbinary basic LCLP 
decoding algorithm is better than that of the MS algorithm 
for all tested SNR values. Figure 6 shows the average number 
of iterations required for the nonbinary basic LCLP and MS 
algorithms to converge during the decoding of the (155, 64) 
LDPC code. The nonbinary basic LCLP decoding algorithm 
requires around 10% to 15% more iterations than the MS 
algorithm to converge for lower SNR levels, whereas both 
algorithms require a similar number of iterations for moderate 
to high SNR values (i.e., in the waterfall region). Hence 
the nonbinary basic LCLP decoding algorithm outperforms 
the MS decoding algorithm in terms of the error-correcting 
performance for the (155, 64) LDPC code. 

The FER curves for the (1055, 424) and the (204, 102) 
LDPC codes are shown in Figures 7 and 8, respectively. In 
both cases the FER of nonbinary basic LCLP decoding is 
within 0.5 dB to 0.75 dB of that of the MS algorithm. Figure 9 
shows the FER curves for the (755, 334) LDPC code. Unlike 
the above-mentioned results, here the FER performance of the 
nonbinary basic LCLP decoding algorithm is around 1.2 dB 
worse than that of the MS algorithm for low to moderate 
SNR values. However, for SNR values higher than 9 dB, the 
MS algorithm shows an error-floor effect and by 10.1 dB 
its FER is the same as that of the nonbinary basic LCLP 
decoding algorithm. After 10.25 dB, the nonbinary basic LCLP 
decoding algorithm also shows the error-floor effect but still 
has a better FER than the MS algorithm. The FER of the nonbinary 
basic LCLP decoding algorithm at 10.25 dB and 10.5 dB was 
simulated for 60 frame errors per simulation point for this 
code. A similar phenomenon was also observed in [18], where 
the binary LCLP decoding algorithm outperformed the MS 
algorithm in the error-floor region. It is important to note that 
the binary (755, 334) LDPC code is constructed with the same 
algorithm as the other (3, 5) group-structured LDPC codes 
[30]; however its minimum distance is relatively low compared 
to other binary LDPC codes from the same family, and hence 
one can expect the binary MS (or SP) algorithm to show an 
error-floor effect. Our observation of a high error-floor for the 
(755, 334) LDPC code over GF(8) could be due to a similar 
problem with respect to the Lee distance. 

The FER of the (155, 64) LDPC code over $\mathbb{Z}_4$ for the 
nonbinary subgradient LCLP decoding algorithm is shown 
in Figure 10. The FER of the nonbinary basic LCLP decoding 
algorithm is also shown here for reference, where 
the maximum number of iterations is set to 100. Both the 
constant and staircase-type step-size rules are used for these 
simulations. Also, Figure 11 shows the average number of 
iterations required for the nonbinary subgradient LCLP decoding 
algorithm to converge, with different step-size rule 
combinations (maximum 100 iterations). The initial value 
of the step size at the first iteration for the simulations of 
Figure 10 was optimized through simulation; for the constant 
step-size rule it is 0.08, whereas for the staircase-type step-size 
rule it is 0.15. The nonbinary subgradient LCLP decoding 
algorithm with the staircase-type step-size rule has a better FER 
than with the constant step-size rule, while requiring a similar 
average number of iterations to converge. 

The FER of the nonbinary subgradient LCLP decoding algorithm 
with the staircase-type step-size rule is 0.38 dB away from 
the FER of the nonbinary basic LCLP decoding algorithm 
for a maximum of 100 iterations and is better by 0.06 dB for 
a maximum of 200 iterations. However, it requires approximately 
3 times as many iterations on average to converge 
as the nonbinary basic LCLP decoding algorithm. As was 
already discussed in the previous section, the complexity of a 
single iteration of the nonbinary subgradient LCLP decoding 
algorithm is significantly lower than that of the nonbinary 
basic LCLP decoding algorithm. However, this complexity 
advantage is somewhat mitigated by the fact that the nonbi- 
nary subgradient LCLP decoding algorithm requires a higher 
number of iterations than the nonbinary basic LCLP decoding 
algorithm to reach a similar FER for a given SNR value. 

For the (204, 102) LDPC code, if the same step-size rule 
and maximum number of iterations are used, then the nonbinary 
subgradient LCLP decoding algorithm requires around 0.75 dB 
more transmit power than the nonbinary basic LCLP decoding 
algorithm to reach the same FER. 

For the (1055,424) and the (755,334) LDPC codes, the 
FER of the nonbinary subgradient LCLP decoding algorithm 
which uses the staircase type step-size rule (maximum 200 
iterations) is similar to or better than that of the nonbinary 
basic LCLP decoding algorithm (maximum 100 iterations). 
For these simulations, the initial value of the step-size (again 
optimized through simulation) for the (1055,424) code was 
0.20 and for the (755,334) code was 0.09. 

IX. Conclusions 

Fig. 8. Frame error rate for the (204, 102) LDPC code over GF(8) under 
8-PSK modulation. 

Fig. 9. Frame error rate for the (755, 334) LDPC code over GF(8) under 
8-PSK modulation. 

In this paper we generalized the basic LCLP decoding 
algorithm and the subgradient LCLP decoding algorithm to 
nonbinary linear codes. The complexity of the nonbinary 
LCLP decoding algorithms is linear in the code's block length 
and hence they can also be used for moderate and long 
block length codes. The complexity of the nonbinary basic LCLP 
decoding algorithm is dominated by the maximum check node 
degree and the number of elements in the nonbinary alphabet. 
Furthermore, we proposed a modified BCJR algorithm for 
efficient check node processing in the nonbinary basic LCLP 
decoding algorithm. The proposed CN processing algorithm 
has complexity linear in the check node degree. We also pro- 
posed an alternative state metric which can be used to reduce 
the run time of the CN calculations of the nonbinary basic 
LCLP decoding algorithm. The error-correcting performance 
of the nonbinary basic LCLP decoding algorithm is similar to 
that of the MS algorithm for some classes of LDPC codes. 